ASSESSING ATTRITION BIAS 


A. INTRODUCTION 

In a randomized controlled trial (RCT), researchers use random assignment to form two (or 
more) groups of study participants that are the basis for estimating intervention effects. Carried 
out correctly, the groups formed by random assignment have similar observable and 
unobservable characteristics, allowing any differences in outcomes between the two groups to be 
attributed to the intervention alone, within a known degree of statistical precision. 

Though randomization (done correctly) results in statistically similar groups at baseline, the 
two groups also need to be equivalent at follow-up, which introduces the issue of attrition. 
Attrition occurs when an outcome is not measured for all participants who were initially assigned 
to the groups. Throughout this paper, attrition (missing outcome data) is defined to be the 
opposite of response (nonmissing outcome data). Attrition can occur for the overall sample, and 
it can differ between the groups; both aspects can affect the statistical equivalence of the groups. 
Both overall and differential attrition create potential for bias when the characteristics of sample 
members who respond in one group differ systematically from those of the members who 
respond in the other. 

To support its efforts to assess design validity, the What Works Clearinghouse (WWC) 
needs a standard by which it can assess the likelihood that findings of RCTs may be biased due 
to attrition. This paper develops the basis for the RCT attrition standard. It uses a statistical 
model to assess the extent of bias for different rates of overall and differential attrition under 
various assumptions regarding the extent to which respondent outcomes are correlated with the 
propensity to respond. The validity of these assumptions is explored using data from three past
experimental evaluations.


The need for a statistical model of attrition bias to inform WWC standards stems from the 
fact that other existing attrition standards are not suited to the WWC objective of assessing the 
validity of RCTs. A number of federal agencies have standards for response rates in data 
collection, but these standards are intended to be general guidelines for all data collection efforts 
and, thus, are not tailored specifically to RCTs. For example, the Office of Management and 
Budget (OMB) and the National Center for Education Statistics (NCES) have established 
response rate targets of 80% and 85%, respectively. However, these standards do not reference 
the potential effect of attrition bias within a study on the effectiveness of an intervention. Thus, 
there are no theoretical or empirical reasons that these thresholds are appropriate in assessing 
attrition within the framework of WWC standards for study designs. 

Prior to the development of the model presented in this paper, the WWC guideline that was 
in place recognized the need for limits on both overall and differential attrition. Specifically, the 
standard consisted of fixed limits for the allowable overall attrition (20%) and the allowable 
differential attrition (7%).¹ However, this standard lacked theoretical and empirical justification.
Moreover, it did not recognize any possibility for a tradeoff between overall and differential 
attrition, such that a higher rate of overall attrition could be offset by a lower rate of differential 
attrition (and vice versa). These gaps underscored the need for a statistical model on which a 
standard could be based. 

In the next section, we present a framework in which both overall and differential attrition 
contribute to possible bias. Under various assumptions about tolerances for potential bias, the 
approach yields a set of attrition rates that falls within the tolerance and a set that falls outside it. 
Because different topic areas may have factors generating attrition that lead to more or less 
potential for bias, the approach allows for refinement within a review protocol that expands or
contracts the set of rates that yield tolerable bias. This approach is the basis on which WWC
attrition standards are set.

¹ The principal investigator of each review could use discretion in setting an attrition standard for his or her
topic area.


B. ATTRITION AND BIAS 

Both overall and differential attrition may bias the estimated effect of an intervention.²
However, the sources of attrition and their relation to outcomes can rarely be observed or known 
with confidence, which limits the extent to which attrition bias can be quantified. The approach 
here is to develop a model of attrition bias that yields potential bias under assumptions about the 
correlation between response and outcomes. This section describes the model and its key 
parameters. It goes on to identify values of the parameters that are consistent with the WWC’s 
previous standards and assesses whether those parameter values are generally consistent with
data from three randomized trials.


1. Model of Attrition Bias 

Attrition that arises completely at random reduces sample sizes but does not create bias. 
However, researchers rarely know whether attrition is random and not related to outcomes. 
When attrition is related to outcomes, different rates of attrition between the treatment and 
control groups can lead to biased impact estimates. Furthermore, if the relationship between 
attrition and outcomes differs between the treatment and control groups, then attrition can lead to 
bias even if the attrition rate is the same in both groups. The focus here is to specify a model 
showing how bias depends on the correlation between outcomes and attrition and the
combination of overall and differential attrition in an RCT.


² Throughout this paper, the word bias refers to a deviation from the true impact for the analysis sample. An
alternative definition of bias could also include deviation from the true impact for a larger population. We focus on 
the narrower goal of achieving causal validity for the analysis sample because nearly all studies reviewed by the 
WWC involve purposeful samples of students and schools. 

To set up the model, consider a variable representing an individual’s latent (unobserved)
propensity to respond, z. Assume z is normally distributed with mean zero and standard deviation
one. If the proportion of individuals who respond is P, an individual is a respondent if his or her
value of z exceeds a threshold, z*:

(1) $z > \Phi^{-1}(1 - P) = z^{*}$

where $\Phi$ is the standard normal cumulative distribution function. For example, in a scenario in
which 75% of individuals respond (P = 0.75), an individual is a respondent if his or her value of
z exceeds the value corresponding to the 25th percentile in the z distribution [that is, exceeds
$\Phi^{-1}(1 - 0.75)$].
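
The threshold in Equation (1) can be computed directly. The following is a minimal sketch using Python's standard library; the function name `response_threshold` is ours and is purely illustrative.

```python
from statistics import NormalDist


def response_threshold(p_respond: float) -> float:
    """Return z* = Phi^{-1}(1 - P), the cutoff above which an individual responds."""
    if not 0.0 < p_respond < 1.0:
        raise ValueError("response rate must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p_respond)


# With a 75% response rate, the cutoff is the 25th percentile of N(0, 1).
z_star = response_threshold(0.75)
print(round(z_star, 3))  # about -0.674
```

Higher response rates push the cutoff further into the lower tail, so fewer low-propensity individuals are excluded.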

The outcome at follow-up, y, is the key variable of interest. We assume that y has a normal 
distribution. Moreover, we assume that y has mean zero and standard deviation one, given that 
any variable can be standardized in this way. The relationship between y and z can then be 
modeled as 

(2) $y = \alpha z + u$

where $\alpha$ is the correlation between z and y, and u is a random variable that is independent of z.³
Note that there are no covariates, and the model assumes no effect of the treatment on the
outcome. If $\alpha$ is 1 or -1, the entire outcome is explained by the propensity to respond. If $\alpha$ is
zero, none of the outcome is explained by the propensity to respond, which is the case when
attrition is completely random. 

The correlation between the propensity to respond and outcomes may differ by treatment 
status. Therefore, we specify Equation (2) separately for treatment and control group members
(subscripted by t and c, respectively):

(3) $y_t = \alpha_t z_t + u_t$
    $y_c = \alpha_c z_c + u_c$.

³ In order for y to be a N(0,1) variable, u must be normally distributed with mean zero and standard deviation
$\sqrt{1 - \alpha^2}$.

Because there is no true impact of the intervention in this model, an unbiased estimator of
the impact should, in expectation, find no difference in outcomes between the treatment and
control groups. Therefore, in the presence of attrition, bias is equal to the difference between the
expected values of $y_t$ and $y_c$ among respondents. Based on the properties of truncated normal
distributions, the analytic formula for the bias, B, is

(4) $B = E(y_t \mid z_t > z_t^{*}) - E(y_c \mid z_c > z_c^{*})$
      $= \alpha_t E(z_t \mid z_t > z_t^{*}) - \alpha_c E(z_c \mid z_c > z_c^{*})$
      $= \frac{\alpha_t \, \phi(\Phi^{-1}(1 - P_t))}{P_t} - \frac{\alpha_c \, \phi(\Phi^{-1}(1 - P_c))}{P_c}$

where $\phi$ is the standard normal density function.
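
As a rough illustration, Equation (4) can be written as a short function. This is a sketch under the model's own assumptions (standard normal z, no true impact); the function names and the input values in the examples are illustrative and do not come from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def respondent_mean_z(p_respond: float) -> float:
    """E(z | z > z*) for standard normal z, where z* = Phi^{-1}(1 - P)."""
    z_star = _STD_NORMAL.inv_cdf(1.0 - p_respond)
    return _STD_NORMAL.pdf(z_star) / p_respond


def attrition_bias(alpha_t: float, alpha_c: float, p_t: float, p_c: float) -> float:
    """Equation (4): bias in effect size units."""
    return alpha_t * respondent_mean_z(p_t) - alpha_c * respondent_mean_z(p_c)


# Same response rate and same correlation in both groups: no bias.
print(attrition_bias(0.3, 0.3, 0.8, 0.8))        # 0.0
# Same correlation, lower response rate in the treatment group: positive bias.
print(attrition_bias(0.3, 0.3, 0.75, 0.85) > 0)  # True
# Same response rate, higher correlation in the treatment group: positive bias.
print(attrition_bias(0.4, 0.3, 0.8, 0.8) > 0)    # True
```

The three calls correspond to the three cases the model distinguishes: no difference between groups, a difference in response rates only, and a difference in correlations only.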

Equation (4) shows that bias can be generated by treatment-control differences in the
response rates ($P_t$ and $P_c$) or in the correlation between y and z ($\alpha_t$ and $\alpha_c$). If neither P nor $\alpha$
differs between these groups, then there is no bias because the same kinds of individuals respond
from both groups.⁴ However, if response rates differ between the treatment and control groups
($P_t \neq P_c$), then bias occurs even when $\alpha_t = \alpha_c$, because respondents in the treatment and control
groups have different average values of z and, thus, different average values of y. Moreover, if
$\alpha_t \neq \alpha_c$, then impact estimates will be biased even if the response rate is the same in both
groups; respondents from the two groups will have different average values of y stemming from
the differences in $\alpha$.⁵


⁴ Those who attrite, nonetheless, will differ systematically from those who do not attrite, which may
compromise the external validity of the study. However, we do not address that issue here.

⁵ It is possible that a difference in the rate of attrition between groups could offset a difference between $\alpha_t$ and
$\alpha_c$. However, throughout this paper, we conservatively assume the opposite: that these differences are reinforcing,
not offsetting.

2. Numeric Relationships Between Bias and Response Rates


From Equation (4), we can map out the numeric relationship between bias (B) and response
rates ($P_t$ and $P_c$) after setting assumptions for the correlations ($\alpha_t$ and $\alpha_c$) between the
propensity to respond and outcomes. Table 1 shows the relationship between bias and response
rates for each of several different assumptions about $\alpha_t$ and $\alpha_c$. Each row of the table pertains to
a different combination of response rates in the treatment ($P_t$) and control ($P_c$) groups, shown in
the first two entries of each row. The remaining entries in each row show magnitudes of bias (in
effect size units) resulting from these response rates, with different columns pertaining to
different assumptions about $\alpha_t$ and $\alpha_c$. Therefore, each column of the table maps out a distinct
relationship between bias and response rates based on the assumed values of $\alpha_t$ and $\alpha_c$ in the
column header.


TABLE 1 


BIAS BY RESPONSE RATE AND CORRELATION BETWEEN OUTCOMES AND THE 
PROPENSITY TO RESPOND (EFFECT SIZE UNITS) 


We explore various possible assumptions for $\alpha_t$ and $\alpha_c$. To pick values of $\alpha_t$ and $\alpha_c$ for
consideration, it is convenient to note that $\alpha^2$ is equivalent to the R-squared from the regression
shown in Equation (2), that is, the proportion of the outcome variance that is explained by the
propensity to respond. Therefore, we consider a range of values for $\alpha_t$ and $\alpha_c$ that yield a range
of possible R-squared values in the treatment group ($R_t^2$) and control group ($R_c^2$):

• $\alpha_t = 0.27$ and $\alpha_c = 0.22$ (implying $R_t^2 = 0.075$ and $R_c^2 = 0.05$)

• $\alpha_t = 0.32$ and $\alpha_c = 0.22$ (implying $R_t^2 = 0.10$ and $R_c^2 = 0.05$)

• $\alpha_t = 0.39$ and $\alpha_c = 0.22$ (implying $R_t^2 = 0.15$ and $R_c^2 = 0.05$)

• $\alpha_t = 0.45$ and $\alpha_c = 0.39$ (implying $R_t^2 = 0.20$ and $R_c^2 = 0.15$)

• $\alpha_t = 0.55$ and $\alpha_c = 0.45$ (implying $R_t^2 = 0.30$ and $R_c^2 = 0.20$)

• $\alpha_t = 0.71$ and $\alpha_c = 0.45$ (implying $R_t^2 = 0.50$ and $R_c^2 = 0.20$)

• $\alpha_t = 1$ and $\alpha_c = 1$ (implying $R_t^2 = 1$ and $R_c^2 = 1$)
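
The link between each correlation and its implied R-squared can be checked with a few lines of arithmetic. This sketch squares the rounded correlations listed above, so the results match the stated R-squared values only approximately (for example, 0.27 squared is about 0.073 rather than 0.075). The names `ALPHA_PAIRS` and `implied_r_squared` are illustrative.

```python
# (alpha_t, alpha_c) pairs as listed in the text, rounded to two decimals
ALPHA_PAIRS = [
    (0.27, 0.22), (0.32, 0.22), (0.39, 0.22), (0.45, 0.39),
    (0.55, 0.45), (0.71, 0.45), (1.0, 1.0),
]


def implied_r_squared(alpha: float) -> float:
    """Share of outcome variance explained by the propensity to respond."""
    return alpha ** 2


for alpha_t, alpha_c in ALPHA_PAIRS:
    print(
        f"alpha_t={alpha_t:.2f} -> R2_t={implied_r_squared(alpha_t):.3f}   "
        f"alpha_c={alpha_c:.2f} -> R2_c={implied_r_squared(alpha_c):.3f}"
    )
```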


The key finding in Table 1 is that, given a set of assumptions regarding the correlation
between outcomes and the propensity to respond, bias can be reduced by either increasing the
overall response rate or reducing the differential response rate. For example, column 4 shows
that an overall response rate of 60% yields a bias of no more than 0.05 only if the differential rate
is 2 percentage points or less, but that if the overall rate is 90%, the differential rate can be as
high as 5 percentage points.


3. Identifying Reasonable Values for Model Parameters 


As shown in Table 1, the relationship between response rates and bias depends on the values 
of $\alpha_t$ and $\alpha_c$. In order to determine which of these relationships should be used, we must identify
which values for these parameters are reasonable to assume. 

As the initial step toward identifying reasonable parameter assumptions, we first determine
which of the assumptions in Table 1 are consistent with the previous WWC attrition standard.
Suppose that the previous standard had been developed to limit the possible bias to 0.05 standard
deviations (the tolerance level for bias that we actually select for the new attrition standard, as
discussed later). In this case, the values of $\alpha_t$ and $\alpha_c$ that are consistent with the previous
standard are the ones for which an overall response rate of 80% and a differential response rate
of 7 percentage points lead to a bias of 0.05. Using the row of Table 1 corresponding to these
response rates ($P_t = 0.765$ and $P_c = 0.835$), we see that bias is approximately equal to 0.05 (that
is, differs from 0.05 by no more than 0.01) in the first, second, and fourth columns, which
correspond to $\alpha_t = 0.27$ and $\alpha_c = 0.22$ (column 1), $\alpha_t = 0.32$ and $\alpha_c = 0.22$ (column 2), and
$\alpha_t = 0.45$ and $\alpha_c = 0.39$ (column 4). Therefore, those assumptions appear to be most consistent with
the previous standard.
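
The comparison against the previous standard can be approximated by evaluating Equation (4) at $P_t = 0.765$ and $P_c = 0.835$ for each set of correlations. This is a sketch that uses the rounded correlations listed in the text, so the computed values may differ slightly from the published entries of Table 1; the helper names are illustrative.

```python
from statistics import NormalDist

STD = NormalDist()


def attrition_bias(alpha_t, alpha_c, p_t, p_c):
    """Equation (4), in effect size units."""
    def mean_z(p):
        # E(z | z > z*) with z* = Phi^{-1}(1 - p)
        return STD.pdf(STD.inv_cdf(1.0 - p)) / p
    return alpha_t * mean_z(p_t) - alpha_c * mean_z(p_c)


# Columns 1 through 7 of Table 1, as (alpha_t, alpha_c), rounded as in the text.
COLUMNS = [
    (0.27, 0.22), (0.32, 0.22), (0.39, 0.22), (0.45, 0.39),
    (0.55, 0.45), (0.71, 0.45), (1.0, 1.0),
]

P_T, P_C = 0.765, 0.835  # 80% overall response, 7-point differential

for column, (alpha_t, alpha_c) in enumerate(COLUMNS, start=1):
    bias = attrition_bias(alpha_t, alpha_c, P_T, P_C)
    print(f"column {column}: bias = {bias:.3f}")
```

Under these inputs, columns 1, 2, and 4 give the smallest biases and the remaining columns are clearly larger, which agrees with the reading of Table 1 described above.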

Our next objective is to determine whether the assumptions that are consistent with the
previous standard (columns 1, 2, and 4 of Table 1) are also consistent with actual data from
randomized trials in education. In fact, from existing studies, we could directly infer $\alpha$ in each
group (treatment or control) if we could observe outcomes for both respondents and
nonrespondents in that group. In each group, the attrition model implies a precise relationship
between $\alpha$ and the relative outcomes of respondents and nonrespondents. Let $\Delta_g$ denote the
difference in outcomes, in effect size units, between respondents and nonrespondents in group g
(either the treatment [t] or control [c] group). It can be shown that

(5a) $\Delta_g = E(y_g \mid z_g > z_g^{*}) - E(y_g \mid z_g < z_g^{*}) = \frac{\alpha_g \, \phi(\Phi^{-1}(1 - P_g))}{P_g (1 - P_g)}$

which implies that

(5b) $\alpha_g = \frac{\Delta_g \, P_g (1 - P_g)}{\phi(\Phi^{-1}(1 - P_g))}$.

Of course, we cannot observe outcomes for nonrespondents, so we cannot observe $\Delta_g$
directly. However, in studies that have both follow-up and baseline test scores, we can use the
baseline test scores as proxies for the follow-up test scores because baseline scores are typically
correlated with follow-up scores. Therefore, for each of several existing studies, we use the
difference in baseline test scores between respondents and nonrespondents as the proxy for $\Delta_g$,
and we use Equation (5b) to calculate $\alpha_g$ for g = t, c.
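
Equation (5b) is simple to evaluate. The sketch below applies it to two treatment-group rows of Table 2 (the response rate and the baseline score difference) and recovers the correlations reported there after rounding; the function name `implied_alpha` is illustrative.

```python
from statistics import NormalDist

STD = NormalDist()


def implied_alpha(delta: float, p_respond: float) -> float:
    """Equation (5b): correlation implied by the respondent-nonrespondent gap."""
    density_at_cutoff = STD.pdf(STD.inv_cdf(1.0 - p_respond))
    return delta * p_respond * (1.0 - p_respond) / density_at_cutoff


# 21st Century, treatment group: P = 0.81, difference = 0.02
print(round(implied_alpha(0.02, 0.81), 2))  # 0.01
# Education Technology, 1st grade, treatment group: P = 0.91, difference = 0.46
print(round(implied_alpha(0.46, 0.91), 2))  # 0.23
```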


Our data for these calculations come from three large-scale randomized trials conducted by
Mathematica Policy Research for IES:

• Evaluation of the 21st Century Community Learning Centers

• Evaluation of Education Technologies in Reading and Mathematics

• Evaluation of Supplemental Reading Comprehension Interventions


One of these studies, the education technology study, had distinct interventions that were 
implemented in four different grade levels (first, fourth, sixth, and ninth), with random 
assignment occurring separately by grade level. Therefore, we calculated parameter values 
separately by grade level in this study. 

For each study, Table 2 presents empirical values of key quantities (namely, the response
rate [P] and the respondent-nonrespondent difference in baseline scores [$\Delta$]) used to calculate $\alpha$
as well as the resulting value of $\alpha$. Values are presented separately for the treatment and control
groups. The studies generally had high response rates (of at least 80% in all but one case) in both
the treatment and control groups. Effect size differences in baseline test scores between
respondents and nonrespondents range widely across studies, from a low of 0.02 in the treatment
group of the 21st Century evaluation to a high of 0.54 in the treatment group of the 6th grade
education technology evaluation.

TABLE 2


RESPONSE RATES, RESPONDENT-NONRESPONDENT DIFFERENCES IN BASELINE
TEST SCORES, AND IMPLIED CORRELATIONS BETWEEN TEST SCORES AND THE
PROPENSITY TO RESPOND FROM THREE RANDOMIZED TRIALS


                               Treatment Group             Control Group
Study                        P      Delta   alpha        P      Delta   alpha
21st Century                0.81    0.02    0.01        0.83    0.10    0.06
Education Technology
  1st Grade                 0.91    0.46    0.23        0.90    0.35    0.18
  4th Grade                 0.87    0.40    0.21        0.90    0.51    0.26
  6th Grade                 0.88    0.54    0.28        0.90    0.44    0.23
  9th Grade                 0.80    0.18    0.10        0.76    0.28    0.16
Reading Comprehension       0.89    0.31    0.16        0.88    0.32    0.17

Note: P is the response rate, Delta ($\Delta$) is the respondent-nonrespondent difference in baseline test
scores in effect size units, and alpha ($\alpha$) is the implied correlation from Equation (5b).


For each study listed in Table 2, there are two empirical values of $\alpha$, one for the treatment
group and one for the control group. For the purposes of assessing whether specific assumptions
in Table 1 are reasonable, it is sufficient to extract two characteristics of the values of $\alpha$ from
each study: (1) the higher of the two values of $\alpha$ and (2) the difference between the higher and
lower value of $\alpha$. It is irrelevant whether the higher value of $\alpha$ in each study comes from the
treatment or control group because, when generating the bias values in Table 1, we always assign
the higher value of $\alpha$ to the group with the lower response rate (which, in Table 1, is arbitrarily
chosen to be the treatment group). This approach is conservative because it assumes that
treatment-control differences in $\alpha$ always reinforce (rather than mitigate) biases stemming from
treatment-control differences in response rates.

Across these studies, the higher value in each pair of $\alpha$'s ranges from 0.06 to 0.28, with a
mean of 0.19. In addition, the difference between the higher and lower value of $\alpha$ ranges from
0.01 to 0.06, with a mean of 0.05. Of the various sets of assumptions for $\alpha_t$ and $\alpha_c$ represented
in the different columns of Table 1, the assumptions in the first column of Table 1 appear to
approximate the empirical values of $\alpha$ most closely.
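
These summary figures can be reproduced approximately from the rounded correlations in Table 2. In the sketch below, the mean gap computed from the rounded entries is about 0.045, close to the reported 0.05; we assume the paper worked from unrounded values, which is not stated in the text. The variable names are illustrative.

```python
from statistics import mean

# (treatment alpha, control alpha) as reported in Table 2
TABLE_2_ALPHAS = {
    "21st Century": (0.01, 0.06),
    "Education Technology, 1st grade": (0.23, 0.18),
    "Education Technology, 4th grade": (0.21, 0.26),
    "Education Technology, 6th grade": (0.28, 0.23),
    "Education Technology, 9th grade": (0.10, 0.16),
    "Reading Comprehension": (0.16, 0.17),
}

# The two characteristics extracted from each study
higher = [max(pair) for pair in TABLE_2_ALPHAS.values()]
gaps = [abs(alpha_t - alpha_c) for alpha_t, alpha_c in TABLE_2_ALPHAS.values()]

print("higher alpha: min", min(higher), "max", max(higher), "mean", round(mean(higher), 2))
print("gap: min", round(min(gaps), 2), "max", round(max(gaps), 2), "mean", round(mean(gaps), 3))
```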

In summary, data from the three studies are consistent with low values for the correlation 
between test scores and the propensity to respond, as well as with small differences in that 
correlation between the treatment and control groups. As discussed earlier, correlations in the 
lower part of the range explored in Table 1—namely, the correlations represented by the first, 
second, and fourth columns of the table—are also consistent with the previous attrition standard. 


Taken together, these findings suggest that values of $\alpha_t$ and $\alpha_c$ in the range of those shown in
the first four columns of Table 1 are reasonable assumptions. In fact, the data used in our
analysis lean toward the first column. However, for certain populations of students not included
in those studies, such as older students who volunteer to participate in a dropout prevention
program, attrition may be more correlated with the outcome, in which case more conservative
assumptions (such as those in the fourth column of Table 1) would be justified.


4. Attrition Tradeoffs Assuming a Constant Bias 


The preceding findings enable us to map out a specific relationship between bias and
response rates by setting $\alpha_t$ and $\alpha_c$ equal to reasonable values. That is, rather than having to
consider the wide range of possible relationships between bias and response rates shown in Table
1, we can now select a particular relationship that corresponds to our chosen values of $\alpha_t$ and
$\alpha_c$.

For the purposes of developing an attrition standard, we choose a threshold degree of
tolerable bias. Using a selected relationship between bias and response rates from Table 1, we
then calculate which combinations of overall and differential response imply biases that exceed
or fall below the threshold. This approach highlights the tradeoff between overall and differential
attrition and can be illustrated graphically. Figure 1 uses a bias threshold of 0.05 standard
deviations of the outcome measure. The green/bottom-left region shows combinations of overall
and differential attrition that yield attrition bias less than 0.05 under pessimistic (but still
reasonable) assumptions ($\alpha_t = 0.45$ and $\alpha_c = 0.39$), the yellow/middle region shows additional
combinations that yield attrition bias less than 0.05 under optimistic assumptions ($\alpha_t = 0.27$ and
$\alpha_c = 0.22$), and the red/top-right region shows combinations that yield bias greater than 0.05
even under optimistic assumptions.
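
The three regions can be approximated by evaluating Equation (4) under both sets of assumptions. The sketch below is not the official boundary calculation. It assumes equal-sized groups, so that the two response rates sit symmetrically around the overall response rate (as in the 0.765 and 0.835 example), and it assigns the higher correlation to the group with the lower response rate. The names `region`, `PESSIMISTIC`, and `OPTIMISTIC` are illustrative.

```python
from statistics import NormalDist

STD = NormalDist()
TOLERABLE_BIAS = 0.05
PESSIMISTIC = (0.45, 0.39)  # (alpha_t, alpha_c), from the text
OPTIMISTIC = (0.27, 0.22)   # (alpha_t, alpha_c), from the text


def attrition_bias(alpha_t, alpha_c, p_t, p_c):
    """Equation (4), in effect size units."""
    def mean_z(p):
        return STD.pdf(STD.inv_cdf(1.0 - p)) / p
    return alpha_t * mean_z(p_t) - alpha_c * mean_z(p_c)


def region(overall_attrition: float, differential_attrition: float) -> str:
    """Classify an attrition combination as 'green', 'yellow', or 'red'.

    Assumption (ours): equal-sized groups, with rates given as proportions.
    """
    overall_response = 1.0 - overall_attrition
    p_low = overall_response - differential_attrition / 2.0
    p_high = overall_response + differential_attrition / 2.0
    if not (0.0 < p_low and p_high < 1.0):
        raise ValueError("implied response rates must lie strictly between 0 and 1")
    if attrition_bias(*PESSIMISTIC, p_low, p_high) < TOLERABLE_BIAS:
        return "green"
    if attrition_bias(*OPTIMISTIC, p_low, p_high) < TOLERABLE_BIAS:
        return "yellow"
    return "red"


print(region(0.10, 0.00))  # green
print(region(0.20, 0.07))  # yellow
print(region(0.50, 0.10))  # red
```

Because the published boundaries are based on Table 1 entries, combinations very close to a boundary may be classified differently by this sketch.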


FIGURE 1 


TRADEOFFS BETWEEN OVERALL AND DIFFERENTIAL ATTRITION 


[Figure not reproduced. Recoverable axis label: Differential Attrition.]

To get some indication of how large the relative bias is, note that for a nationally normed
test, a difference of 0.05 represents about 2 percentile points for a student at the 50th percentile.
For example, if the reported effect suggests the intervention will move the student from the 50th
percentile to the 60th percentile (a 0.25 effect size), the true effect may be to move the student
from the 50th percentile to the 58th percentile (a 0.20 effect size). Doubling the tolerable bias to
0.10 means that an intervention that reportedly moves a student from the 50th percentile to the
60th percentile may move the student only to the 56th percentile, a scenario that seems to imply
a fairly large bias. With these considerations, we set the threshold degree of tolerable bias to be
0.05.
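
The percentile figures above follow from the standard normal distribution. The sketch below assumes normally distributed test scores; the function name `percentile_after_effect` is illustrative.

```python
from statistics import NormalDist


def percentile_after_effect(effect_size: float, start_percentile: float = 50.0) -> float:
    """Percentile reached after shifting a score by effect_size standard deviations."""
    std = NormalDist()
    z_start = std.inv_cdf(start_percentile / 100.0)
    return 100.0 * std.cdf(z_start + effect_size)


for effect_size in (0.05, 0.15, 0.20, 0.25):
    print(effect_size, round(percentile_after_effect(effect_size)))
# 0.05 -> 52, 0.15 -> 56, 0.20 -> 58, 0.25 -> 60
```

A reported effect of 0.25 with a bias of 0.05 corresponds to a true effect of 0.20 (58th percentile), and with a bias of 0.10 to a true effect of 0.15 (56th percentile).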


5. Using the Attrition Bias Model to Create a Standard 
In developing the topic area review protocol, the principal investigator (PI) considers the 
types of samples and likely relationship between attrition and student outcomes for studies in the 
topic area. When a PI has reason to believe that much of the attrition is exogenous—for example, 
parent mobility with young children—more optimistic assumptions regarding the relationship 
between attrition and outcomes might be appropriate. On the other hand, when a PI has reason to 
believe that more of the attrition could be endogenous—for example, high school students 
choosing whether to participate in an intervention—more conservative assumptions may be 
appropriate. The combinations of overall and differential attrition that are acceptable given either 
optimistic or conservative assumptions are illustrated in Figure 1, and translate into evidence 
standards ratings: 
• For a study in the green/bottom-left region, attrition is expected to result in an
  acceptable level of bias even under conservative assumptions, which yields a
  rating of Meets Evidence Standards.

• For a study in the red/top-right region, attrition is expected to result in an
  unacceptable level of bias even under optimistic assumptions, and the study can
  receive a rating no higher than Meets Evidence Standards with Reservations,
  provided it establishes baseline equivalence of the analysis sample.

• For a study in the yellow/middle region, the PI’s judgment about the sources of
  attrition for the topic area determines whether a study Meets Evidence Standards.
  If a PI believes that optimistic assumptions are appropriate for the topic area,
  then a study that falls in this range is treated as if it were in the green/bottom-left
  region. If a PI believes that conservative assumptions are appropriate, then a
  study that falls in this range is treated as if it were in the red/top-right region. A PI chooses
  whether the optimistic or conservative assumption applies for the review.
  However, once the assumption is chosen, it will be applied to all studies
  reviewed in that area and not vary across studies.
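
The three rules above amount to a small decision table. The sketch below encodes only what the text states; the function name and string labels are illustrative, and the outcome for a study that fails to establish baseline equivalence is not covered here.

```python
MEETS = "Meets Evidence Standards"
RESERVATIONS = (
    "Meets Evidence Standards with Reservations "
    "(provided baseline equivalence of the analysis sample is established)"
)


def highest_possible_rating(region: str, pi_assumption: str) -> str:
    """Return the highest rating attrition allows for a study.

    region: 'green', 'yellow', or 'red' (position in Figure 1)
    pi_assumption: 'optimistic' or 'conservative', fixed for the whole topic area
    """
    if region not in {"green", "yellow", "red"}:
        raise ValueError(f"unknown region: {region!r}")
    if pi_assumption not in {"optimistic", "conservative"}:
        raise ValueError(f"unknown assumption: {pi_assumption!r}")
    if region == "green":
        return MEETS
    if region == "red":
        return RESERVATIONS
    # Yellow: treated as green under optimistic assumptions, as red otherwise.
    return MEETS if pi_assumption == "optimistic" else RESERVATIONS


print(highest_possible_rating("yellow", "optimistic"))
print(highest_possible_rating("yellow", "conservative"))
```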


When the unit of assignment differs from the unit of analysis and both types of units have
the potential to leave the study, the attrition standard will be applied separately to (1) all units of
assignment and (2) the units of analysis contained within any units of assignment that did not
leave the study before follow-up data collection. The study must meet the attrition standard for
both the units of assignment and the units of analysis in order to Meet Evidence Standards.




